Several improvements to build speed and handling of symlinks #19

marmbrus · 2014-12-30T06:29:36Z

Sorry for the pretty large PR, but I was wondering if you would be interested in merging some changes that we have been using. This PR adds the following features:

Image creation can optionally be separated into two parts: A (rarely changing) baseDockerImage, which contains transitive dependencies and other container environment, and a dockerfile which contains only the jars for your project.
SBTs internal caching mechanism is used to augment docker's when building images. This feature can avoid sending lots of data to the docker daemon when nothing has changed
A new command addCompressed is added to DockerFile. This command uses tar czf to create an archive that is shipped to the docker daemon. Unlike the normal add command, this command correctly handles symlinks.

marcus-drake · 2015-01-01T15:20:03Z

Overall I like the changes, but I have a few comments.

Regarding baseDockerImage I rather see that we add support for building any amount of images (in a single sbt project). If they are built in order (and caching is added) your case with a base image would be supported.

The caching support is great. Although I think it is not sufficient to only look at the files in the Dockerfile, we must also take into account:

If the FROM image have been changed (same name but rebuilt or pulled a new one).
If the output image have been removed from the docker daemon. Or if the name/tag have changed.

There is also the case of ADD instructions with URLs as source, but we can probably skip that.

It would be good if there was some task to force a build and skip the cache.

I like that addCompressed supports symlinks. Currently it also preserves file ownership and flags. Perhaps it should be renamed to something that clarifies how it differs from add.
It would be nice if we could use JTar or some other library instead of relying on an external command.

marmbrus · 2015-01-03T18:47:36Z

Thanks for the feedback!

Regarding baseDockerImage I rather see that we add support for building any amount of images (in a single sbt project). If they are built in order (and caching is added) your case with a base image would be supported.

You can kind of do this already (and we actually do in our project) by defining your own taskKey and doing myKey <<= buildDockerImage(myKey). What did you have in mind? Or you would rather we just didn't include the baseDockerImage by default?

If the FROM image have been changed (same name but rebuilt or pulled a new one).

Done. We now canonicalize the From instruction to be the id of the image.

If the output image have been removed from the docker daemon.

Fixed.

Or if the name/tag have changed.

The name is used as the cache file for sbt, so if it changes, the cache will be invalidated.

It would be good if there was some task to force a build and skip the cache.

Added dockerNoCache

I like that addCompressed supports symlinks. Currently it also preserves file ownership and flags. Perhaps it should be renamed to something that clarifies how it differs from add.

Sure, do you have a name you'd prefer? addTar?

It would be nice if we could use JTar or some other library instead of relying on an external command.

Yeah, I could try and do this, but we already fork for the docker command. It seems unlikely that any system that supports docker won't have a fairly standard tar command.

marcus-drake · 2015-01-04T19:01:38Z

I'm on vacation this and next week, will take a look at this when I get
back.
Den 3 jan 2015 19:47 skrev "Michael Armbrust" [email protected]:

Thanks for the feedback!

Regarding baseDockerImage I rather see that we add support for building
any amount of images (in a single sbt project). If they are built in order
(and caching is added) your case with a base image would be supported.

You can kind of do this already (and we actually do in our project) by
defining your own taskKey and doing myKey <<= buildDockerImage(myKey).
What did you have in mind? Or you would rather we just didn't include the
baseDockerImage by default?

If the FROM image have been changed (same name but rebuilt or pulled a new
one).

Done. We now canonicalize the From instruction to be the id of the image.

If the output image have been removed from the docker daemon.

Fixed.

Or if the name/tag have changed.

The name is used as the cache file for sbt, so if it changes, the cache
will be invalidated.

It would be good if there was some task to force a build and skip the
cache.

Added dockerNoCache

I like that addCompressed supports symlinks. Currently it also preserves
file ownership and flags. Perhaps it should be renamed to something that
clarifies how it differs from add.

Sure, do you have a name you'd prefer? addTar?

It would be nice if we could use JTar or some other library instead of
relying on an external command.

Yeah, I could try and do this, but we already fork for the docker
command. It seems unlikely that any system that supports docker won't
have a fairly standard tar command.

—
Reply to this email directly or view it on GitHub
#19 (comment)
.

marcus-drake · 2015-01-15T20:25:27Z

Hi again,

Sorry about the delay. Great changes in the last commit! There are few (minor) things left before I think it can be merged.

I rather see that we remove baseDockerImage completely. Until better support for multiple images have been added you can get the desired behaviour as you described.

Regarding the tar command, as you said it is unlikely that the command does not exists. But I still prefer if we can use JTar since there will be cases where the tar command is missing or works differently.
I can add that support if you don't have time to do it.

Everywhere where you use the docker command, you should use the value from dockerCmd in docker instead. For cases when the user want to customize which docker command to use.

I think that the current solution with naming archives with a randomized name and later changing its corresponding ADD instruction (in the caching process) could have been handled in a better way. I have tried a solution where the naming of files in the docker context is postponed from creation of the instruction to the build phase of the image. I'm still experimenting with it, so lets see how it turns out.

Regarding addCompressed I feel like the value you get out using the function is not that the directory is compressed, but that it keeps the full file tree with symlinks and ownership etc.
How about addTree, addPure or addUnsafe? Naming is hard...

marmbrus · 2015-01-15T20:35:08Z

Awesome, I like addTree. That seems pretty clear to me.

I'll remove baseDockerImage, fix usage with dockerCmd in docker, and port over some fixes I have been using internally.

If you have time to investigate JTar and whatever caching changes you want to make that would be awesome. I can test these both once you have something working.

Improvements to build speed through caching. Add addCompressed function.

9b43830

Code review comments

f2f1f61

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Several improvements to build speed and handling of symlinks #19

Several improvements to build speed and handling of symlinks #19

marmbrus commented Dec 30, 2014

marcus-drake commented Jan 1, 2015

marmbrus commented Jan 3, 2015

marcus-drake commented Jan 4, 2015

marcus-drake commented Jan 15, 2015

marmbrus commented Jan 15, 2015

Several improvements to build speed and handling of symlinks #19

Are you sure you want to change the base?

Several improvements to build speed and handling of symlinks #19

Conversation

marmbrus commented Dec 30, 2014

marcus-drake commented Jan 1, 2015

marmbrus commented Jan 3, 2015

marcus-drake commented Jan 4, 2015

marcus-drake commented Jan 15, 2015

marmbrus commented Jan 15, 2015